%%shell
jupyter nbconvert --to html /content/MIE1628_A5.ipynb
Explanation of all 5 boxes:
Azure Data Lake - Azure Data Lake is a scalable data storage service. ADL includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages.
Azure Databricks - Azure Databricks is a PaaS offering from Microsoft built on top of open-source Apache Spark, with support for languages such as Python, Scala, R, and SQL. ADB lets us spin up clusters almost instantly, and it integrates with other Azure services such as Azure ML, Big Data offerings, Power BI, etc.
Azure Data Factory - Azure Data Factory is the platform that solves such data scenarios. It is the cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
Azure Synapse Analytics - Azure Synapse Analytics is a limitless analytics service that brings together data integration, enterprise data warehousing and big data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated options – at scale. Azure Synapse brings these worlds together with a unified experience to ingest, explore, prepare, transform, manage and serve data for immediate BI and machine learning needs.
Azure Cosmos DB - Cosmos DB is a globally distributed, low-latency, multi-model (key-value, graph, document, column) database for managing data at large scale. It is a cloud-based NoSQL database offered as a PaaS (Platform as a Service) by Microsoft Azure. It is a highly available, high-throughput, reliable database and is often called a serverless database. Cosmos DB evolved from Azure DocumentDB and is available in all regions.
Which blocks will go where?
Ingest Data -> Azure Data Factory : ADF lets us lift and shift data to the cloud; there are a few options, such as copying to Blob Storage. Users can create automated pipelines and triggers to execute them as required. A good example use case: when raw data arrives daily, we can create automated triggers to run the pipeline and ingest it.
Data Storage -> Azure Data Lake : As described above, ADL is a scalable data storage service. Once ADF has ingested the data, we need a place to store it, and ADL is a good fit for anything from small to very large volumes. One benefit of ADL is that we can store different types of data, such as text, logs, video, documents, etc.
Prepare and Transform Data -> Azure Databricks : Once we have ingested the raw data, we need to derive meaningful insights from it, which usually requires some pre-processing first, such as ETL/ELT and data manipulation. For this purpose Azure Databricks is preferred because of its support for big-data tooling such as PySpark and MLlib. ETL/ELT can also be performed via Azure Data Factory, so ADF could be a reasonable fit here as well. However, at this stage we would ideally want to produce visualizations, manipulate data, combine datasets, etc., which makes Azure Databricks the preferred option in this case.
i) Azure Synapse Analytics : A SaaS offering from Microsoft that provides end-to-end data warehouse capabilities. Synapse is a unified environment where we can perform ingestion, exploration, and ETL/ELT, and serve data for business intelligence and machine learning needs. In this case, Synapse will integrate with the Databricks ETL/ELT output and provide data insights and ML capabilities.
ii) Azure Cosmos DB : Once we have performed all the analytics on the organized data, there may be a requirement to store the data in a structured way to serve real-world applications. For example, this could be storing the logs of a deployed web application: a developer might want to inspect the incoming logs to monitor the application. Another use case is inserting a new row whenever a sensor reading arrives and alerting the user, who can then check the sensor logs of the last hour to understand what happened. Hence Azure Cosmos DB would be a good fit for serving the data to the user in a desirable way.
Azure Stream Analytics is a fully managed stream processing engine that is designed to analyze and process large volumes of streaming data with sub-millisecond latencies. Patterns and relationships can be identified in data that originates from a variety of input sources including applications, devices, sensors, clickstreams, and social media feeds. These patterns can be used to trigger actions and initiate workflows such as creating alerts, feeding information to a reporting tool, or storing transformed data for later use.
An Azure Stream Analytics job consists of an input, query, and an output. It ingests data from Azure Event Hubs, Azure IoT Hub, or Azure Blob Storage.
IOT RESOURCE GROUP DEPLOYED
PART B
Dataset Link - https://archive.ics.uci.edu/ml/datasets/Drug+Review+Dataset+%28Drugs.com%29
Dataset Description - The dataset provides patient reviews on specific drugs along with related conditions and a 10-star patient rating reflecting overall patient satisfaction. The data was obtained by crawling online pharmaceutical review sites.
Attribute Information -
Problem Statement
To perform EDA and understand general trends in drug usage, conditions suffered by patients, etc.
Sentiment Analysis on the reviews
Predict the rating of drug based on the reviews provided.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os
df_test=pd.read_csv('drugsComTest_raw.tsv', sep='\t')
df_train=pd.read_csv('drugsComTrain_raw.tsv', sep='\t')
df_train.head()
df_test.head()
df_train.info(verbose=True)
df_test.info(verbose=True)
We can already see that there are some null values in the condition column. Let us check how many there are and what percentage they represent in both the train and test data.
df_test.describe()
df_train.describe()
df_train.isnull().any()
df_test.isnull().any()
df_train['condition'].isnull().sum()
df_test['condition'].isnull().sum()
652/114513
295/53766
Both datasets contain roughly half a percent of null values in the condition column (about 0.57% of train and 0.55% of test). We can do the following things:
Drop the null values
Try to fill in the empty values using the historical data present in the dataset by building a mapping. However, this might not be useful: more than one drug can treat multiple conditions, and choosing a condition at random, or always the most common one, could introduce bias. Our main goal is to predict the rating based on the reviews received; had it been the other way around, e.g. predicting the condition based on the review, it would have been a different story.
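To make the trade-off in the second option concrete, the mapping idea can be sketched as follows. This is a minimal illustration on a toy frame (the column names match the dataset, but the values are made up): a missing condition is filled with the most common condition recorded for the same drug, and rows whose drug has no recorded condition stay null.

```python
import pandas as pd

# Toy frame with the dataset's column names; values are hypothetical.
df = pd.DataFrame({'drugName':  ['A', 'A', 'A', 'B'],
                   'condition': ['Acne', 'Acne', None, None]})

# Most common condition per drug, computed from the non-null rows only.
mode_map = (df.dropna(subset=['condition'])
              .groupby('drugName')['condition']
              .agg(lambda s: s.mode().iloc[0]))

# Fill missing conditions via the mapping; drug 'B' has no history, so it stays null.
df['condition'] = df['condition'].fillna(df['drugName'].map(mode_map))
print(df)
```

Even in this toy case the bias is visible: every missing row for drug 'A' gets the majority condition, regardless of what the review actually says.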
Before preprocessing let us perform some EDA by combining both the datasets to visualize the data.
data=pd.concat([df_train,df_test])
data.shape
The dataset came pre-split?! I could not find any reason why they split the dataset without pre-processing, even though both splits contain null values, and textual data usually requires some cleaning (removing stopwords, etc.) to perform better in sentiment analysis and with TF-IDF based algorithms.
While it is best practice to keep the test set pristine, untouched by anyone, I am concerned that if I clean my training data thoroughly but do not clean the test data equally well, whatever model I build may not be up to the mark. Hence I decided to merge the data and clean it in one go. Merging in this case also lets me analyse all the records at once.
data_con=data['condition'].value_counts()
data_con.shape
Since it is not possible to plot all 916 conditions, we will plot the top and bottom 20 to get some idea.
data_con[0:20].plot.bar(rot=90)
data_con[-20:].plot.bar(rot=90)
### Chart 2 - Distribution of Ratings
data_ratings=data['rating'].value_counts()
data_ratings
data_ratings.plot.pie()
#Comments on Chart 2 : Distribution of ratings
### Chart 3 Top 20 drugs with rating 10/10 and Top 20 drugs with rating 1/10
top_drugs=data.loc[data.rating==10,"drugName"].value_counts()
top_drugs[:20].plot.bar(rot=90)
bottom_drugs=data.loc[data.rating==1,"drugName"].value_counts()
bottom_drugs[:20].plot.bar(rot=90)
### Plot 4 on time-series data
pd_dt=pd.to_datetime(data['date'])
pd_dt
pd_dt_counts=pd_dt.value_counts()
pd_dt_counts
#year plot
yr=pd_dt.dt.year
yr.value_counts().plot.bar(rot=90)
#month plot
mt=pd_dt.dt.month
mt.value_counts().plot.bar(rot=90)
data['year']=yr
data.groupby('year')['condition'].nunique().plot.bar(rot=90)
data.groupby('year')['drugName'].nunique().plot.bar(rot=90)
2008 has the fewest reviews; over time the number of reviews increased, peaking in 2016.
Since this dataset was generated by crawling online pharma websites, one can understand that 2008 was still the early days for such review sites.
In the month-wise distribution, December has the fewest reviews while August has the most; the other months are roughly evenly distributed.
The number of conditions rose and fell periodically until 2014, after which it rose until 2017. Does an increase in the number of conditions also lead to a rise in the number of drugs?
Yes, the number of unique drug names over the years follows exactly the same pattern as the conditions.
data.groupby('condition')['drugName'].nunique().sort_values(ascending=False).head(50)
data.groupby('condition')['drugName'].nunique().sort_values(ascending=False)[0:20].plot.bar(rot=90)
pd.set_option('display.max_colwidth', None)
Removing Null values first
data.isnull().any()
data.shape
# Dropping the data points with null values as it's very much less than 1% of the whole dataset
data = data.dropna(how = 'any', axis = 0)
print ("The shape of the dataset after null values removal :", data.shape)
Removing the rows whose condition contains leftover HTML ("</span>") noise.
span_data = data[data['condition'].str.contains('</span>',case=False,regex=True) == True]
print('Number of rows with </span> values : ', len(span_data))
noisy_data_ = 100 * (len(span_data)/data.shape[0])
print('Total percent of noisy data {} % '.format(noisy_data_))
data.drop(span_data.index, axis = 0, inplace=True)
data.shape
Removing the 'not listed/other conditions'
#check the percentage of 'not listed / othe' conditions
not_listed = data[data['condition'].str.contains('Not Listed / Othe', case=False, regex=True)==True]
print('Number of not_listed values : ', len(not_listed))
percent_not_listed = 100 * len(not_listed)/data.shape[0]
print('Total percent of noisy data {} % '.format(percent_not_listed))
#check again using an exact match instead of str.contains
not_listed = data[data['condition']=='Not Listed / Othe']
print('Number of not_listed values : ', len(not_listed))
percent_not_listed = 100 * len(not_listed)/data.shape[0]
print('Total percent of noisy data {} % '.format(percent_not_listed))
data.drop(not_listed.index, axis = 0, inplace=True)
data.shape
#quickly checking once again if there are any changes.
data.groupby('condition')['drugName'].nunique().sort_values(ascending=False).head(100)
print("Total loss in data is ", (215063-211247)/215063)
Textual Data Cleaning
Steps for reviews pre-processing.
pip install nltk
data
#import the libraries for pre-processing
from bs4 import BeautifulSoup
import nltk
nltk.download('stopwords')
import re
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
stops = set(stopwords.words('english')) #english stopwords
stemmer = SnowballStemmer('english') #SnowballStemmer
def review_to_words(raw_review):
    # 1. Remove HTML
    review_text = BeautifulSoup(raw_review, 'html.parser').get_text()
    # 2. Keep only ASCII letters (remove digits, punctuation, etc.)
    letters_only = re.sub('[^a-zA-Z]', ' ', review_text)
    # 3. Lowercase and split into words
    words = letters_only.lower().split()
    # 4. Remove stopwords
    meaningful_words = [w for w in words if w not in stops]
    # 5. Stemming
    stemming_words = [stemmer.stem(w) for w in meaningful_words]
    # 6. Join the words back with spaces
    return ' '.join(stemming_words)
data['review_clean']=data['review'].apply(review_to_words)
data.head(10)
Comments after cleaning the Textual data
from textblob import TextBlob
sentiment_polarity = []
for review in data['review_clean']:
    blob = TextBlob(review)
    sentiment_polarity.append(blob.sentiment.polarity)
data['sentiment_polarity']=sentiment_polarity
textblob_dist = pd.DataFrame(
    {'values': [np.sum(data['sentiment_polarity'] > 0),
                np.sum(data['sentiment_polarity'] < 0),
                np.sum(data['sentiment_polarity'] == 0)]},
    index=['positive', 'negative', 'neutral'])
textblob_dist.plot.pie(y='values')
print(textblob_dist)
data.head(10)
pip install vadersentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
v_intensity=SentimentIntensityAnalyzer()
v_scores = []
for review in data['review_clean']:
    compound_value = v_intensity.polarity_scores(review)
    v_scores.append(compound_value['compound'])
data['vader_intensity']=v_scores
data.head(10)
sent_dist = pd.DataFrame(
    {'values': [np.sum(data['vader_intensity'] > 0),
                np.sum(data['vader_intensity'] < 0),
                np.sum(data['vader_intensity'] == 0)]},
    index=['positive', 'negative', 'neutral'])
sent_dist.plot.pie(y='values')
print(sent_dist)
VADER sentiment analysis classifies more comments as negative than TextBlob does, and in general practice VADER has been found to handle negative polarity better than TextBlob. Let us further analyse how they work on some typical medical reviews.
print(v_intensity.polarity_scores("Taking this medicine had side effects such as headache"))
print(TextBlob("Taking this medicine had side effects such as headache").sentiment.polarity)
This is a complex interpretation even from a human perspective.
In reality the medicine might have some side effects irrespective of the person, age, etc., so this behaviour is expected.
Side effects of a medicine are usually treated negatively: patients are informed of them beforehand and nobody desires side effects from a medication.
Overall, in my opinion, side effects of a medicine should be scored negatively; however, both libraries return neutral results.
print(v_intensity.polarity_scores("Taking this medicine cured my illness"))
print(TextBlob("Taking this medicine cured my illness").sentiment.polarity)
This is where the real problem arises: "cured my illness" is something positive, yet more weight is given to the word "illness", producing a negative score.
TextBlob performs somewhat better here, but its score is still not good.
print(v_intensity.polarity_scores("I have Diarrhea and taking this medicine had no effect"))
print(TextBlob("I have Diarrhea and taking this medicine had no effect").sentiment.polarity)
A neutral sentence where TextBlob performed better.
print(v_intensity.polarity_scores("I have Diarrhea and taking this medicine made it more worse"))
print(TextBlob("I have Diarrhea and taking this medicine made it more worse").sentiment.polarity)
A negative review where VADER performed better.
print(v_intensity.polarity_scores("After taking Levonorgestrel, I had migraine and vomitigs"))
print(TextBlob("After taking Levonorgestrel, I had migraine and vomitigs").sentiment.polarity)
Not reliable results; let us modify this sentence slightly and see what we get.
print(v_intensity.polarity_scores("After taking Levonorgestrel, I had migraine and vomitigs hence I would not recommend it to anyone suffering from cough"))
print(TextBlob("After taking Levonorgestrel,I had migraine and vomitigs hence I would not recommend it to anyone suffering from cough").sentiment.polarity)
Vader outperforms and assigns correctly.
text="""Suboxone has completely turned my life around. I feel healthier, I'm excelling at my job and I always have money in my pocket and my savings account. I had none of those before Suboxone and spent years abusing oxycontin. My paycheck was already spent by the time I got it and I started resorting to scheming and stealing to fund my addiction. All that is history. If you're ready to stop, there's a good chance that suboxone will put you on the path of great life again. I have found the side-effects to be minimal compared to oxycontin. I'm actually sleeping better. Slight constipation is about it for me. It truly is amazing. The cost pales in comparison to what I spent on oxycontin."""
print(v_intensity.polarity_scores(text))
print(TextBlob(text).sentiment.polarity)
VADER wins in the example above. This review was scored without pre-processing; after pre-processing the score is 0.89.
Comments on Sentiment Analysis
Naturally it was expected that reviews with a rating above 7 or 8 would be strongly positive, those rated 4-5 neutral, and those below 4 strongly negative.
Some things to note here are :
We are dealing with medical text, and in this domain both VADER and TextBlob may lack predefined rules for medical terms. For example, medical conditions such as acne, rashes, or shaking of the arms and legs, as well as drug names and phrases like "weight gain", may not be well covered by the libraries' lexicons. A human reading the reviews would understand the sentiment, but it is hard to rely on results based on generic words only.
TextBlob and VADER were built around general-purpose data such as Twitter posts and movie reviews.
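A minimal pure-Python sketch of this limitation (the lexicons and valence values below are made up for illustration): a lexicon-based scorer can only react to words it knows, so a review full of medical vocabulary falls through as neutral unless the lexicon is extended with domain terms. VADER, for instance, exposes its lexicon as a plain dict that can be extended in the same spirit.

```python
# Hypothetical lexicons: a tiny 'general-purpose' one and a medical extension.
BASE_LEXICON = {'good': 1.0, 'bad': -1.0, 'amazing': 2.0}
MEDICAL_LEXICON = {'cured': 2.0, 'rash': -1.0, 'migraine': -1.5}

def lexicon_score(text, lexicon):
    """Average valence of the known words in the text; 0.0 if none are known."""
    hits = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

review = 'this medicine cured my migraine'
print(lexicon_score(review, BASE_LEXICON))                         # no known words -> neutral
print(lexicon_score(review, {**BASE_LEXICON, **MEDICAL_LEXICON}))  # now scored positive overall
```

The review reads clearly positive to a human, but the base lexicon scores it 0.0; only after adding the medical terms does the scorer see anything at all.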
Final Conclusion on Sentiment Analysis
Creating our Target Column
We will use a threshold rating of 5 for assigning the class: a review gets a positive label (1) if rating > 5 and a negative label (0) otherwise.
data['rating_label'] = data['rating'].apply(lambda x: 1 if x > 5 else 0)
data.head(10)
data['rating_label'].value_counts().plot.pie()
data['rating_label'].value_counts()
print("% of class 1 labels are ",148071/data.shape[0])
print("% of class 0 labels are",63176/data.shape[0])
We find that the dataset is imbalanced with roughly a 70:30 ratio. Even though we could run ML models on it and get somewhat acceptable results, it should still be treated as a class-imbalance problem, so we proceed to downsample the majority class.
Downsampling the dataset to make equal distribution
class_1=data[data['rating_label']==1]
class_0=data[data['rating_label']==0]
print(class_1.shape)
print(class_0.shape)
from sklearn.utils import resample
# Downsample without replacement so no review row is duplicated
class_1_downsample = resample(class_1, replace=False, n_samples=class_0.shape[0], random_state=42)
print(class_1_downsample.shape)
class_1_downsample
data_downsampled=pd.concat([class_1_downsample,class_0])
print(data_downsampled.shape)
print(data_downsampled['rating_label'].value_counts())
data_downsampled['rating_label'].value_counts().plot.pie()
We have downsampled the majority class to get an equal (50:50) distribution of the data. We will evaluate the performance on both datasets (balanced and unbalanced) in Q4 and compare the results.
We cleaned the textual data to get rid of unnecessary common words, punctuation, etc.
We performed sentiment analysis to understand the nature and distribution of the reviews.
Created our target variable based on a rating threshold of 5 for binary classification.
Checked the dataset for class imbalance and found a 70:30 ratio (class 1 : class 0).
Downsampled the dataset to a 50:50 split to remove the class imbalance.
We now have 2 datasets, let us now move to Q4 and build the models.
We will primarily use two algorithms:
Random Forest - This algorithm works well on many classification problems seen in practice; it is an ensemble classifier built from a number of decision trees.
Naive Bayes - i) quite fast, ii) known to work well with textual data. Though multinomial NB formally expects count features, in practice it also works with TF-IDF weights.
We could run other supervised classifiers such as KNN, SVM, or Logistic Regression. KNN is of course impractical for such a large dataset with so many features.
Since we cannot feed raw text to the models, we will use a TF-IDF vectorizer to convert our textual data to a vectorized format.
from sklearn.model_selection import train_test_split #import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer #import TfidfVectorizer
from sklearn.metrics import confusion_matrix #import confusion_matrix
from sklearn.naive_bayes import MultinomialNB #import MultinomialNB
from sklearn.ensemble import RandomForestClassifier #import RandomForestClassifier
# Creates TF-IDF vectorizer and transforms the corpus
vectorizer = TfidfVectorizer()
reviews_corpus = vectorizer.fit_transform(data.review_clean)
reviews_corpus.shape
#checking how our sparse matrix looks like
reviews_corpus[0:100].toarray()
#not of relevance but just to check that all elements are not zero
np.max(reviews_corpus[0:100].toarray())
sentiment_data=data['rating_label']
sentiment_data.shape
from sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(reviews_corpus,sentiment_data,test_size=0.30)
print('Train data shape ',X_train.shape,Y_train.shape)
print('Test data shape ',X_test.shape,Y_test.shape)
MNB Model
clf = MultinomialNB().fit(X_train, Y_train) #fit the training data
pred = clf.predict(X_test) #predict the sentiment for test data
print("Accuracy: %s" % str(clf.score(X_test, Y_test))) #check accuracy
print("Confusion Matrix")
print(confusion_matrix(Y_test,pred)) #print confusion matrix
Random Forest Classifier
#fit the model and predict the output
clf = RandomForestClassifier().fit(X_train, Y_train)
pred = clf.predict(X_test)
print("Accuracy: %s" % str(clf.score(X_test, Y_test)))
print("Confusion Matrix")
print(confusion_matrix(Y_test, pred))
from sklearn.metrics import *
print('f1 score is',f1_score(Y_test,pred))
print('recall score is',recall_score(Y_test,pred))
print('precision score is',precision_score(Y_test,pred))
Results with using downsampled dataset
# Creates TF-IDF vectorizer and transforms the corpus
vectorizer_downsampled = TfidfVectorizer()
reviews_corpus_downsampled = vectorizer_downsampled.fit_transform(data_downsampled.review_clean)
reviews_corpus_downsampled.shape
sentiment_data_downsampled=data_downsampled['rating_label']
sentiment_data_downsampled.shape
X_train_down,X_test_down,Y_train_down,Y_test_down = train_test_split(reviews_corpus_downsampled,sentiment_data_downsampled,test_size=0.30)
print('Train data shape ',X_train_down.shape,Y_train_down.shape)
print('Test data shape ',X_test_down.shape,Y_test_down.shape)
clf_down = MultinomialNB().fit(X_train_down, Y_train_down) #fit the training data
pred_down = clf_down.predict(X_test_down) #predict the sentiment for test data
print("Accuracy: %s" % str(clf_down.score(X_test_down, Y_test_down))) #check accuracy
print("Confusion Matrix")
print(confusion_matrix(Y_test_down, pred_down)) #print confusion matrix
#fit the model and predict the output
clf_rf_down = RandomForestClassifier().fit(X_train_down, Y_train_down)
pred_rf_down = clf_rf_down.predict(X_test_down)
print("Accuracy: %s" % str(clf_rf_down.score(X_test_down, Y_test_down)))
print("Confusion Matrix")
print(confusion_matrix(Y_test_down,pred_rf_down,))
print('f1 score is',f1_score(Y_test_down,pred_rf_down))
print('recall score is',recall_score(Y_test_down,pred_rf_down))
print('precision score is',precision_score(Y_test_down,pred_rf_down))
Balancing the dataset does not necessarily improve model performance. One obvious factor is that downsampling discarded some important data: after TF-IDF, the vocabulary of our review corpus shrank from roughly 36k to 27k terms.
Our objective : Predict rating based on the review data.
If we were to choose solely on f1-score, then Random Forest trained on the unbalanced set works better than on the balanced one. Had the task been to predict the condition from the review, we would (in my opinion) have optimised more for recall. However, we will stick with the balanced data for AutoML as well.
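To make the f1-versus-recall trade-off explicit, the metrics quoted above relate to the binary confusion-matrix entries as follows (the counts below are made up for illustration):

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Precision, recall, and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts: 120 true positives, 10 false positives, 20 false negatives.
p, r, f1 = metrics_from_confusion(tn=50, fp=10, fn=20, tp=120)
print(f'precision={p:.3f} recall={r:.3f} f1={f1:.3f}')
```

Because F1 is the harmonic mean, it always sits between precision and recall, so optimising F1 balances the two, whereas a condition-prediction task that fears false negatives would weight recall directly.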
data_dict = {'clean_review': data_downsampled['review_clean'], 'target': data_downsampled['rating_label']}
data_clean = pd.DataFrame(data=data_dict)
data_clean.head()
data_clean.shape
data_clean.to_csv('uci_data_review_clean.csv')
Q5. Use Automated ML for your data set. Explain best model results.
A total of 30 models were run by Azure ML with train: 60%, validation: 10%, test: 30%. The highest accuracy was achieved by Logistic Regression with MaxAbsScaler pre-processing.
The ROC plot indicates a good fit of our model.
Confusion Matrix depicted above on the test set.
The scores are slightly above the Random Forest classifier but within a comparable range. It is not clear to me why micro, macro, and weighted averages were reported even though our target is binary rather than multiclass. I could not find a binary option in ML Studio at all; even the official documentation for a binary classification task (https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-first-experiment-automated-ml) chooses AUC_weighted.
Based on the definitions, and considering that our dataset is balanced, I think the macro scores are the better ones to consider, as they simply average over the classes.
Screenshot showing the hyperparameters used by AutoML. Again, I am not sure why AutoML used "multinomial"; I understand this corresponds to optimising the cross-entropy loss.
So Azure relied on TF-IDF as well. However, it did some additional pre-processing, as its sparse matrix has 76468 columns, and it used MaxAbsScaler to scale the sparse matrix.
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MaxAbsScaler
#fit the model and predict the output
max_abs_scaler=MaxAbsScaler()
X_train_maxabs=max_abs_scaler.fit_transform(X_train_down)
X_test_maxabs=max_abs_scaler.transform(X_test_down)
lr=LogisticRegression(penalty="l2",C=6866.488450042998,class_weight="balanced",multi_class="multinomial",solver="saga")
clf_lr = lr.fit(X_train_maxabs,Y_train_down)
pred_lr_down = clf_lr.predict(X_test_maxabs)
print("Accuracy: %s" % str(clf_lr.score(X_test_maxabs, Y_test_down)))
print("Confusion Matrix")
print(confusion_matrix(Y_test_down, pred_lr_down))
from sklearn.metrics import f1_score, recall_score, precision_score
print('f1 score is', f1_score(Y_test_down, pred_lr_down))
print('recall score is', recall_score(Y_test_down, pred_lr_down))
print('precision score is', precision_score(Y_test_down, pred_lr_down))
Logistic Regression performs much better than the Naive Bayes model from Q4. Its metrics are slightly lower than Random Forest's but comparable. We cannot compare this directly with Azure AutoML, as Azure transformed the text into 76468 features, whereas our corpus has only 27626, less than half as many.
Finally we have two models in comparison :
Our balanced dataset with a sparse matrix of 27626 columns gives the best results with Random Forest, with an accuracy of 0.88.
The balanced dataset fed to Azure AutoML generated a sparse matrix of 76468 columns and gives the best results with Logistic Regression, with an accuracy of 0.89.
Both give comparable results, though Logistic Regression has somewhat better scores (f1, recall, precision) than Random Forest.
What more can be done ?